Abstract

Identification of falls while performing normal activities of daily living (ADL) is important to ensure personal safety and well-being. However, falling is a short-term activity that occurs rarely and infrequently. This poses a challenge for traditional supervised classification algorithms, because there may be very little training data for falls (or none at all) to build generalizable models for falls. This paper proposes an approach for the identification of falls using a wearable device in the absence of training data for falls but with plentiful data for normal ADL. We propose three ‘X-Factor’ Hidden Markov Model (XHMM) approaches. The XHMMs have ‘inflated’ output covariances (observation models). To estimate the inflated covariances, we propose a novel cross-validation method to remove ‘outliers’ from the normal ADL; these outliers serve as proxies for the unseen falls and allow the XHMMs to be learnt using only normal activities. We tested the proposed XHMM approaches on two activity recognition datasets and show high detection rates for falls in the absence of fall-specific training data. We show that the traditional method of choosing a threshold based on the maximum negative log-likelihood to identify unseen falls is ill-posed for this problem. We also show that supervised classification methods perform poorly when very limited fall data are available during the training phase.
1. Introduction


Identification of normal Activities of Daily Living (ADL), e.g., walking, hand washing, making breakfast, etc., is important to understand a person’s behaviour, goals and actions [1]. However, in certain situations, a more challenging, useful and interesting research problem is to identify cases when an abnormal activity occurs, as it can have direct implications on the health and safety of an individual. An important abnormal activity is the occurrence of a fall. However, falls occur rarely, infrequently and unexpectedly w.r.t. the other normal ADLs, and this leads to little or no training data for them [?]. The Centers for Disease Control and Prevention, USA [?], suggests that on average, patients incur 2.6 falls per person per year. Recent studies also suggest that even in a long-term experimental setup only a few real falls may be captured [?]. In these situations with highly skewed fall data, a typical supervised activity recognition system may misclassify a ‘fall’ as one of the existing normal activities, as ‘fall’ may not be included in the classifier’s training set. An alternative strategy is to build fall-detection-specific classifiers that assume abundant training data for falls, which is hard to obtain in practice. Another challenge is the data collection for falls itself, as it may require a person to actually fall, which may be harmful and ethically questionable, and falls collected in controlled laboratory settings may not be truly representative of falls in naturalistic settings [?].

The research question we address in this paper is: can we recognise falls in a person-independent manner by observing only normal ADL, with no training data for falls? We use HMMs for the present task, as they are well suited to sequential data and can model human motions with high accuracy [7]. Typically, an HMM is trained on normal activities, and the maximum negative log-likelihood on the training data is set as a threshold to identify a fall as an outlier. However, such a threshold can severely affect the classifier’s performance because of spurious artifacts present in the sensor data, and most falls may then be classified as normal activities.
In this paper, we use an outlier detection approach to identify falls and present three X-Factor HMM based approaches for detecting short-term fall events. The first method models each normal activity by a separate HMM, and the second models all normal activities together by a single HMM; in both, each HMM state explicitly models a pose of a movement. An alternative HMM is constructed whose model parameters are the averages of the normal activity models, while the averaged covariance matrix is artificially ‘inflated’ to model unseen falls. In the third method, an HMM is trained to model the transitions between normal activities, where each hidden state represents a normal activity, and a single hidden state (for unseen falls) is added with an inflated covariance based on the average of the covariances of all the other states. The inflation parameters of the proposed approaches are estimated using a novel cross-validation approach in which the outliers in the normal data are used as proxies for unseen fall data. We present another method that leverages these outliers to train a separate HMM as a proxy model to detect falls. We also compare the performance of the one-class SVM and one-class nearest neighbour approaches, along with several supervised classification algorithms that use the full data for normal activities while the number of falls in the training set is gradually increased. We show that supervised classifiers perform worse when limited data for falls are available during training. This paper is a comprehensive extension of the work of Khan et al. [?] in terms of:

• Proposing two new models to detect unseen falls by (i) modelling transitions among normal activities to train an HMM and adding a new state to model unseen falls, and (ii) training a separate HMM on only the outliers in the normal activities data to model unseen falls.

• Data pre-processing, extraction of signals from raw sensor data, and the number and type of features differ from Khan et al. [?].

• Studying the effect of changing the number of states on the proposed HMM 
methods for fall detection. 

• Identifying, through experiments, the similarity between the outliers rejected from the normal activities and the unseen falls.

• Additional experiments evaluating the effect of the quantity of fall data available during the training phase on the performance of the supervised versions of the proposed fall detection methods and two other supervised classification methods.
2. Related Work


Research in fall detection spans over two decades, with several recent papers [?] that discuss different methodologies, trends and ensuing challenges using body-worn, ambient or vision-based fall detection techniques. Several research works in fall detection are based on thresholding techniques [?] or supervised classification [?]. One of the major challenges in fall detection is the scarcity of fall data [5]; therefore, such techniques are difficult to use in practice. With this in mind, we survey techniques that attempt to detect falls by employing generative models, outlier/anomaly detection and one-class classification [?] based techniques that only use data from normal activities to build the model and identify a fall as an anomaly or outlier.

Thome et al. [?] present a Hierarchical HMM (HHMM) approach for fall detection in video sequences. The HHMM’s first layer has two states, an upright standing pose and lying. They study the relationship between angles in the 3D world and their projection onto the image plane, and derive an error angle introduced by the image formation process for a standing posture. Based on this information, they label other poses as ‘non-standing’, and thus falls can be distinguished from other motions. A two-layer HMM approach, SensFall [?], is used to identify falls from other normal activities. In the first layer, the HMM classifies an unknown activity as a normal vertical activity or ‘other’, while in the second layer the ‘other’ activity is classified as either a normal horizontal activity or a fall. Tokumitsu et al. [?] present an adaptive sensor network intrusion detection approach based on human activity profiling. They use multiple HMMs for every subject in order to improve the detection accuracy, accounting for the fact that a single person can have multiple patterns for the same activity. The data are collected using infra-red sensors. A new activity sequence is fed to all the HMMs and the likelihoods are computed. If none of the likelihoods computed from the corresponding HMMs is greater than its pre-determined threshold, an anomaly is identified. Cheng et al. [16] present a fall detection algorithm based on pattern recognition and human posture analysis. The data are collected through a tri-axial accelerometer embedded in smartphones, and several temporal features are computed. An HMM is employed to filter out noisy character data and to perform dimensionality reduction. A one-class SVM (OSVM) is applied to reduce false positives, followed by a posture analysis to counteract the missed alarms until a desired accuracy is achieved.

Zhang et al. [?] trained an OSVM from positive samples (falls) and outliers from non-fall ADL and show that falls can be detected effectively. Yu et al. [?] propose to train a Fuzzy OSVM on fall activities captured using video cameras and to tune its parameters using fall and some non-fall activities. Their method assigns fuzzy membership to different training samples to reflect their importance during classification, and is shown to perform better than OSVM. Popescu [19] presents a fall detection technique that uses acoustic signals of normal activities for training and detects fall sounds as deviations from them. They train an OSVM, a one-class nearest neighbour (OCNN) classifier and a one-class GMM classifier (that uses a threshold) on normal acoustic signals, and find that OSVM performs the best; however, it is outperformed by its supervised counterpart. Medrano et al. [?] propose to identify falls using a smartphone as a novelty with respect to the normal activities, and found that OCNN performs better than OSVM but is outperformed by a supervised SVM.

Supervised and thresholding techniques for fall detection collect artificial fall data in a laboratory under non-naturalistic settings; however, such fall data may not be truly representative of actual falls, and learning from them may lead to over-fitting. To overcome the need for a sufficient set of representative ‘fall’ samples, we propose three ‘X-Factor’ HMM based approaches that identify falls across different people while learning models only from data for normal activities.

3. Proposed Fall Detection Approaches 

The problem we investigate in this paper pertains to activity recognition, and the datasets we use capture temporal activities performed by humans. Hidden Markov Models (HMMs) are effective in modelling the temporal dynamics of data sequences and consider the history of actions when making a decision on the current sequence. The HMM is a doubly stochastic process for modelling generative sequences that can be characterized by an underlying process generating an observable sequence. Formally, an HMM consists of the following components [?]:

• N - the number of hidden states in the HMM. The hidden states can be connected in several ways, for example in a left-to-right manner or fully interconnected (ergodic). The set of states can be denoted as $S = \{S_1, S_2, \ldots, S_N\}$ and the state at time $t$ as $q_t$.

• M - the number of distinct observation symbols per state, which correspond to the physical output of the system being modelled. The symbols can be denoted as $V = \{v_1, v_2, \ldots, v_M\}$. When the observation is continuous, $M = \infty$, and the observation distribution can be approximated using a Gaussian or a mixture of Gaussians, with a mean and covariance for each hidden state as the underlying parameters.

• A - the state transition probability distribution $A = \{a_{ij}\}$, where $a_{ij}$ represents the probability of state $j$ following state $i$ and is expressed as:

$$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i, j \le N \tag{1}$$

The state transition coefficients have the following properties:

$$a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1 \quad \text{for } 1 \le i \le N$$

The state transition matrix $A$ is independent of time. For the ergodic design, where any state can reach any other state, $a_{ij} > 0$ for all $i$ and $j$, whereas for other topologies one or more $a_{ij} = 0$.

• B - the observation symbol probability distribution in state $j$, $B = \{b_j(k)\}$, where

$$b_j(k) = P[v_k \text{ at } t \mid q_t = S_j], \quad 1 \le j \le N, \; 1 \le k \le M \tag{2}$$


• π - the initial state distribution $\pi = \{\pi_i\}$, where

$$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N \tag{3}$$
The complete set of parameters of an HMM can also be compactly represented as [?]:

$$\lambda = (\pi, A, B) \tag{4}$$

A pictorial representation of a 3-state discrete HMM is shown in Figure 1. The model follows a Markovian assumption, i.e., the state at time $t$ is independent of all states at times $t-2, \ldots, 1$ given the state at time $t-1$, and an independence assumption, i.e., the output observation at time $t$ is independent of all previous observations and states given the current state.
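Every method described below compares log-likelihoods $\log P(O \mid \lambda)$, so a minimal sketch of how this quantity can be computed may help. It assumes single-Gaussian emissions with diagonal covariances (the setting adopted in Section 5.4) and uses the standard forward algorithm in log space to avoid numerical underflow; the function names are illustrative and not taken from the authors’ code.

```python
import numpy as np
from scipy.special import logsumexp


def diag_gaussian_logpdf(obs, means, variances):
    """log N(o_t | mu_j, diag(var_j)) for every frame t and state j.

    obs: (T, D) observation sequence; means, variances: (N, D) per-state parameters.
    Returns a (T, N) array of per-frame, per-state log-densities.
    """
    diff = obs[:, None, :] - means[None, :, :]  # (T, N, D)
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :],
        axis=2,
    )


def hmm_log_likelihood(obs, pi, A, means, variances):
    """log P(O | lambda) for lambda = (pi, A, B), via the forward algorithm in log space."""
    log_b = diag_gaussian_logpdf(obs, means, variances)
    with np.errstate(divide="ignore"):  # zero probabilities become -inf, which logsumexp handles
        log_pi, log_A = np.log(pi), np.log(A)
    log_alpha = log_pi + log_b[0]  # initialisation: alpha_1(j) = pi_j * b_j(o_1)
    for t in range(1, obs.shape[0]):
        # induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return float(logsumexp(log_alpha))  # termination: log sum_j alpha_T(j)
```

Working in log space matters here because the sequences are long and the per-frame densities are small, so the plain product form of the forward recursion would underflow.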

HMMs have been used successfully to detect human activities with high accuracy [?]. Mannini and Sabatini [22] compare various single-frame classifiers against an HMM-based sequential classifier for activity recognition using on-body accelerometers and report superior performance of the HMM classifiers. Typically, two approaches are applied to model human actions and activities using HMMs [?]:

(i) Modelling Poses: Train an HMM for an activity by explicitly modelling the poses of a movement by each state, or

(ii) Modelling Activities: Train an HMM for different activities by modelling each activity by a single state.



Figure 1: Discrete HMM with 3 states and 3 possible outputs 
We consider both of these approaches to propose ‘X-Factor’ based models to identify falls when their training data are not available, as discussed next.

3.1. Pose HMM 

The traditional method to detect unseen abnormal activities is to model each normal activity using an HMM (by modelling the poses of a movement by each state), compare the likelihood of a test sequence under each of the trained models and, if it is below a pre-defined threshold for all the models, identify it as an anomalous activity [?]. For fall detection, we model each normal activity $i$ by an ergodic HMM that evolves through $k$ states. The observations $o_j(t)$ in state $j$ are modelled by a single Gaussian distribution. Each model $i$ is described by the set of parameters $\lambda_i = \{\pi_i, A_i, (\mu_{ij}, \Sigma_{ij})\}$, where $\pi_i$ is the prior, $A_i$ is the transition matrix, and $\mu_{ij}$ and $\Sigma_{ij}$ are the mean and covariance matrix of a single Gaussian distribution, $\mathcal{N}(\mu_{ij}, \Sigma_{ij})$, giving the observation probability $\Pr(o_t \mid j)$ for the $j$th HMM state. This method estimates the probability that an observed sequence has been generated by each of the $i$ models of normal activities. If this probability falls below a threshold $T_i$ for every HMM, a fall is detected. Typically, an HMM is trained for each normal activity on the full training data, and the individual activity threshold is set as the maximum negative log-likelihood of the training sequences (we call this method HMM1). If a new activity’s negative log-likelihood is above each of these thresholds (i.e., its likelihood is below each threshold), it is identified as a fall.
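The HMM1 decision rule can be sketched as follows, assuming the negative log-likelihoods of each sequence under every activity HMM have already been computed (for example with a forward-algorithm routine). The function and activity names are hypothetical. The last usage line also illustrates the fragility of the max-NLL threshold discussed in Section 4: a single spurious training sequence inflates a threshold enough to hide a fall.

```python
import numpy as np


def fit_max_nll_thresholds(train_nll_per_activity):
    """T_i = maximum negative log-likelihood of activity i's training sequences."""
    return {act: float(np.max(nll)) for act, nll in train_nll_per_activity.items()}


def is_fall_hmm1(test_nll_per_activity, thresholds):
    """Fall iff the test sequence's NLL exceeds the threshold of every activity HMM."""
    return all(test_nll_per_activity[act] > thr for act, thr in thresholds.items())


# Usage with made-up NLL values.
thr = fit_max_nll_thresholds({"walking": [10.0, 12.0, 11.0], "sitting": [5.0, 6.0, 7.0]})
print(is_fall_hmm1({"walking": 20.0, "sitting": 15.0}, thr))  # True: above both thresholds
# One spurious walking sequence (NLL 500) raises T_walking, so the same test is missed.
thr_bad = fit_max_nll_thresholds({"walking": [10.0, 12.0, 500.0], "sitting": [5.0, 6.0, 7.0]})
print(is_fall_hmm1({"walking": 20.0, "sitting": 15.0}, thr_bad))  # False
```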

Quinn et al. [23] present a general framework based on Switched Linear Dynamical Systems for condition monitoring of a premature baby receiving intensive care, introducing the ‘X-factor’ to deal with unmodelled variation from the normal events that may not have been seen previously. This is achieved by inflating the system noise covariance of the normal dynamics, so that the X-factor has the highest likelihood in regions far away from normality, based on which events can be classified as ‘not normal’. We extend this idea to formulate an alternative HMM (we call this approach XHMM1) to model unseen fall events. This approach constructs an alternative HMM to model fall events by averaging the parameters of the $i$ HMMs and increasing the averaged covariances by a factor of $\xi$, such that each state’s covariance matrix is expanded.
Thus, the parameters of the X-Factor HMM are $\lambda_{XHMM1} = \{\bar{\pi}, \bar{A}, \bar{\mu}, \xi\bar{\Sigma}\}$, where $\bar{\pi}$, $\bar{A}$, $\bar{\mu}$ and $\bar{\Sigma}$ are the averages of the parameters $\pi_i$, $A_i$, $\mu_i$ and $\Sigma_i$ over the $i$ HMMs. Each of the $i$ HMMs is trained on non-fall data obtained after removing outliers from the normal activities, and these outliers serve as the validation set for optimizing the value of $\xi$ using cross-validation (see details in Section 4). For a test sequence, the log-likelihood is computed under all the HMM models ($i$ HMMs representing $i$ normal activities and the alternative HMM representing fall events), and the model with the largest value is designated as its class label.
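A minimal sketch of how the XHMM1 parameters could be assembled and used. It assumes every per-activity HMM has the same number of states and diagonal covariances, and that state $j$ of one model is averaged with state $j$ of the others; the text does not say how states are matched across models, so this alignment is an assumption. The dictionary keys and function names are hypothetical.

```python
import numpy as np


def build_xhmm1(models, xi):
    """Average per-activity HMM parameters and inflate the averaged covariances by xi.

    models: list of dicts with keys 'pi' (N,), 'A' (N, N), 'means' (N, D), 'vars' (N, D).
    Averaging row-stochastic matrices keeps every row summing to 1.
    """
    avg = {key: np.mean([m[key] for m in models], axis=0) for key in ("pi", "A", "means", "vars")}
    avg["vars"] = xi * avg["vars"]  # the X-factor: expand every state's covariance
    return avg


def classify(log_likelihoods):
    """Return the label (an activity or 'fall') whose model gives the largest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)
```

At test time, the log-likelihood of a sequence would be computed under each activity HMM and under the averaged, inflated model, and `classify` picks the winner.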

3.2. Normal Pose HMM 

Another method to identify abnormal activities is to model all the normal activities together using a single HMM; if a test sequence’s likelihood falls below a predefined threshold, it is identified as anomalous [24]. For fall detection, we group all the normal activities together and train a single HMM in which normal poses are modelled by each state. The idea is to learn the ‘normal concept’ from the labelled data. This method estimates the probability that the observed sequence has been generated by this common model for all the normal activities, and if this probability falls below a threshold $T$, a fall is detected. Typically, the maximum negative log-likelihood on the training data is set as the threshold to detect unseen falls (we call this method HMM2). Similar to XHMM1, we propose to construct an alternative HMM to model the ‘fall’ activities whose parameters ($\lambda_{XHMM2}$) remain the same as those of the HMM that models the non-fall activities together ($\lambda$), except for the covariance, whose inflated value is computed using cross-validation (we call this method XHMM2; see details in Section 4). For a test sequence, the log-likelihood is computed under both HMM models (the HMM representing non-fall activities and the alternative HMM representing fall events), and the one with the larger value is designated as its class label.

The intuition behind the XHMM1 and XHMM2 approaches is that if the states representing non-fall activities are modelled using Gaussian distributions, then fall events coming from another distribution can be modelled using a new Gaussian (the X-factor) with a larger spread but the same mean as the non-fall activities. Observations that are close to the mean retain a high likelihood under the original Gaussian distribution for the normal activities, whereas the X-factor has a higher likelihood for observations that are far away from the normal activities. To keep the assumptions about unseen falls simple, other extra factors, such as a different mean or number of states, are not introduced in the proposed approaches.
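The one-dimensional case makes this intuition concrete. Setting $\log \mathcal{N}(x; \mu, \sigma^2) = \log \mathcal{N}(x; \mu, \xi\sigma^2)$ and solving gives the crossover distance $(x-\mu)^2 = \sigma^2 \, \xi \ln\xi / (\xi - 1)$ for $\xi > 1$: closer to the mean the normal Gaussian wins, farther away the X-factor wins. The sketch below checks this numerically; it is an illustration of the idea only, not the multivariate HMM used in the paper.

```python
import math


def gauss_logpdf(x, mu, var):
    """Log-density of a 1-D Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def crossover_distance(var, xi):
    """|x - mu| at which N(mu, var) and the X-factor N(mu, xi * var) are equally likely (xi > 1)."""
    return math.sqrt(var * xi * math.log(xi) / (xi - 1.0))


def is_x_factor(x, mu, var, xi):
    """True if the inflated (X-factor) Gaussian explains x better than the normal one."""
    return gauss_logpdf(x, mu, xi * var) > gauss_logpdf(x, mu, var)


print(round(crossover_distance(1.0, 4.0), 4))  # 1.3596
print(is_x_factor(0.0, 0.0, 1.0, 4.0), is_x_factor(3.0, 0.0, 1.0, 4.0))  # False True
```

A larger $\xi$ pushes the crossover outwards, which is why $\xi$ trades false alarms against missed falls and has to be tuned.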

3.3. Activity HMM 

Smyth [25] addresses the problem of real-time fault monitoring, where it is difficult to model all the unseen fault states of a system, and proposes to add a novel $(j+1)$th hidden state (in an HMM) to cover all other possible states not accounted for by the known $j$ states. The novel state’s prior probability is kept the same as those of the other known states, and the density of the observable data given the unknown state is defined using non-informative Bayesian priors. For detecting falls, we train a single HMM to model transitions between normal activity sequences, with parameters $\lambda_{XHMM3} = \{\pi, A, \mu, \Sigma\}$, where each hidden state represents a normal activity, and add an extra hidden state to the model; its mean and covariance are estimated by averaging the means and covariances of all the other states representing the normal activities. The X-factor is introduced to vary the covariance of this novel state by a factor of $\xi$, which can be determined using cross-validation (see Section 4). Adding a novel state to the existing HMM means
adding a row and a column to $A$ to represent transitions to and from the state capturing unseen falls. However, this information is not available a priori. For a fault detection application, Smyth [25] designs a 3-state HMM, adds a novel 4th state to model unknown anomalies, sets the probability of remaining in the same state to 0.97, and distributes the transitions to other states uniformly. We use a similar idea and set the self-transition probability of the fall state to 0.95, with the rest of the probability distributed uniformly over transitions from falls to normal activities. For transitions from the different normal activities to falls, a probability of 0.05 is set (to capture the assumption that falls occur rarely), and the transition probabilities between the different normal activities are scaled such that each row of the matrix $A$ sums to 1. Viterbi decoding [?] is employed on a test sequence to find the most likely sequence of hidden states that generated it; if this sequence contains the novel state, the test sequence is classified as a fall, otherwise as a normal activity.
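The XHMM3 construction and decision rule can be sketched as below, with each observation being one window’s feature vector and diagonal Gaussian emissions. The probabilities 0.95 and 0.05 follow the text; a uniform prior over all states (including the fall state) is an assumption borrowed from Smyth’s choice of keeping the novel state’s prior equal to the others. The helper names are hypothetical.

```python
import numpy as np


def add_fall_state(A, means, variances, xi, p_self_fall=0.95, p_to_fall=0.05):
    """Append a novel 'fall' state to an activity-level HMM with N normal-activity states."""
    n = A.shape[0]
    A_new = np.zeros((n + 1, n + 1))
    A_new[:n, :n] = A * (1.0 - p_to_fall)  # rescale normal->normal so rows still sum to 1
    A_new[:n, n] = p_to_fall               # normal -> fall (falls are rare)
    A_new[n, n] = p_self_fall              # fall -> fall
    A_new[n, :n] = (1.0 - p_self_fall) / n # fall -> each normal activity, uniformly
    fall_mean = means.mean(axis=0)         # average of the normal-state means
    fall_var = xi * variances.mean(axis=0) # X-factor: inflated average covariance
    return A_new, np.vstack([means, fall_mean]), np.vstack([variances, fall_var])


def viterbi(obs, pi, A, means, variances):
    """Most likely state sequence for a (T, D) sequence under diagonal Gaussian emissions."""
    diff = obs[:, None, :] - means[None, :, :]
    log_b = -0.5 * np.sum(np.log(2.0 * np.pi * variances)[None] + diff**2 / variances[None], axis=2)
    with np.errstate(divide="ignore"):
        log_pi, log_A = np.log(pi), np.log(A)
    T, N = log_b.shape
    delta = log_pi + log_b[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: best path ending in i, then i -> j
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):  # backtrack through the stored argmax pointers
        path[t - 1] = backptr[t, path[t]]
    return path


def is_fall_xhmm3(obs, pi, A, means, variances):
    """Fall iff the decoded state sequence visits the novel (last) state."""
    fall_state = A.shape[0] - 1
    return bool(np.any(viterbi(obs, pi, A, means, variances) == fall_state))
```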
3.4. HMMNormOut


As discussed in Sections 3.1 and 3.2, some outliers are rejected from each of the normal activities; these may arise from artifacts in the sensor readings or from mislabelling of the training data. These rejected sensor readings from each normal activity are grouped together, and two HMMs are trained, one for the non-fall activities and one for the outlier activities. We call this approach HMMNormOut. The HMM learnt on the outlier activities may not be truly representative of falls, but it can model the atypical activities that deviate from the non-fall data.


4. Threshold Selection and Proxy Outliers 

As discussed in Section 1, falls occur rarely and infrequently compared to normal activities; therefore, it is difficult to obtain labelled data for them. This may result in situations with abundant data for normal activities and none for falls. To detect falls using the traditional HMM approaches (HMM1 and HMM2), a threshold is typically set on the likelihood of the data given an HMM trained on this ‘normal’ data. This threshold is normally chosen as the maximum negative log-likelihood [24], and can be interpreted as a slider between raising false alarms and risking missed alarms [?]. A major drawback of this approach is that it assumes that the data for each normal activity are correctly labelled and that the sensor readings are non-spurious. This assumption can be detrimental to fall detection performance: any abnormal sensor reading or mislabelled training sequence can alter this threshold and adversely affect performance. For the proposed approaches, another challenge is to estimate the parameter $\xi$ for XHMM1, XHMM2 and XHMM3 in the absence of fall data during the training phase.

To address the above-mentioned issues and find an appropriate $\xi$, we propose to use the deviant sequences (outliers) within the ‘normal’ data. The idea is that even though the ‘normal’ data may not contain any falls, they may contain sensor readings that are spurious, incorrectly labelled or significantly different. These outliers can serve as a proxy for the fall data in order to learn the parameter $\xi$ of the three XHMMs. To find the outliers, we use the concept of quartiles of a ranked set of data values: the three points that divide the data set into four equal groups, each comprising a quarter of the data.
Given the log-likelihoods of the training sequences for an HMM, the lower quartile ($Q_1$), the upper quartile ($Q_3$) and the inter-quartile range ($IQR = Q_3 - Q_1$), a point $P$ is qualified as an outlier if

$$P > Q_3 + \omega \times IQR \quad \lor \quad P < Q_1 - \omega \times IQR \tag{5}$$

where $\omega$ determines the percentage of data points that lie within the non-extreme limits.
Based on $\omega$, the extreme log-likelihood values that represent spurious training data can be removed, which leads to:

1. the creation of a validation set comprising the outliers (proxies for falls), and

2. the computation of the parameter $\xi$ for the proposed XHMM approaches.
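Equation 5 can be sketched as follows. The quartiles are computed with NumPy’s default linear interpolation, which is an assumption: other quartile conventions (for example, the one used by a particular box-plot routine) can move the fences slightly. Function names are hypothetical.

```python
import numpy as np


def iqr_outlier_mask(log_likelihoods, omega=1.5):
    """Boolean mask of points outside [Q1 - omega*IQR, Q3 + omega*IQR] (Equation 5)."""
    ll = np.asarray(log_likelihoods, dtype=float)
    q1, q3 = np.percentile(ll, [25, 75])
    iqr = q3 - q1
    return (ll > q3 + omega * iqr) | (ll < q1 - omega * iqr)


def split_non_fall_and_outliers(sequences, log_likelihoods, omega=1.5):
    """Split training sequences into 'non-fall' data and 'outlier' proxies for falls."""
    mask = iqr_outlier_mask(log_likelihoods, omega)
    non_fall = [s for s, m in zip(sequences, mask) if not m]
    outliers = [s for s, m in zip(sequences, mask) if m]
    return non_fall, outliers


# Usage: the last sequence has a suspiciously low log-likelihood.
ll = np.array([-10.0, -11.0, -12.0, -10.5, -11.5, -100.0])
print(split_non_fall_and_outliers(list("abcdef"), ll))  # (['a', 'b', 'c', 'd', 'e'], ['f'])
```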

Figure 2(a) shows the log-likelihood $\log \Pr(O \mid \lambda_{running})$ for 1262 equal-length (1.28 seconds) running activity sequences of the DLR dataset (see Section 5.1). Figure 2(b) shows a box plot with the quartiles and the outliers (shown as +) for $\omega = 1.5$. Figure 2(c) shows the same data as in Figure 2(a) but with the outliers removed.
Figure 2: Log-likelihoods (a) before and (c) after outlier removal; (b) shows a box plot of the quartiles for this data and the outliers for $\omega = 1.5$.


We employ an internal cross-validation to train the three XHMMs using only the non-fall data. We first split the normal data into two sets: ‘non-fall’ data and ‘outlier’ data (see Figure 3). We do this using Equation 5 with a parameter $\omega$ that is set manually and used only for this initial split. For each activity, an HMM is trained on the full normal data and, based on $\omega$, ‘outliers’ are rejected; the remaining data are considered ‘non-fall’. To optimize the covariance parameter $\xi$, we use $K$-fold cross-validation: the HMMs are trained on $\frac{K-1}{K}$ of the ‘non-fall’ data, and tested on the remaining $\frac{1}{K}$ of the ‘non-fall’ data and on all the ‘outlier’ data. This is done $K$ times and repeated for different values of $\xi$. The value of $\xi$ that gives the best performance metric (see Section 5.5), averaged over the $K$ folds, is chosen as the best parameter. Then, each classifier is re-trained with this value of the parameter on the ‘non-fall’ activities.

Figure 3: Cross Validation Scheme
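The $\xi$ search can be sketched as follows. To keep the example self-contained, each HMM is replaced by a single diagonal Gaussian over frames, and the ‘performance metric’ is assumed to be the geometric mean of the true-positive rate (outliers flagged) and the true-negative rate (held-out non-fall data kept); both substitutions are hypothetical and stand in for the HMM training and the metric defined elsewhere in the paper.

```python
import numpy as np


def fit_gaussian(X):
    """Hypothetical stand-in for HMM training: one diagonal Gaussian over all rows."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6


def log_lik(X, mean, var):
    """Per-row diagonal-Gaussian log-likelihood."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var, axis=1)


def fold_score(mean, var, xi, normal_test, outliers):
    """Assumed metric: sqrt(TNR * TPR) of the normal-vs-X-factor decision."""
    def flagged(X):
        # flagged as 'fall' when the inflated (X-factor) model is more likely
        return log_lik(X, mean, xi * var) > log_lik(X, mean, var)
    tnr = 1.0 - flagged(normal_test).mean()
    tpr = flagged(outliers).mean()
    return float(np.sqrt(tnr * tpr))


def select_xi(non_fall, outliers, xis, k=3, seed=0):
    """K-fold internal cross-validation: train on (K-1)/K of non-fall, test on the rest + all outliers."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(non_fall)), k)
    mean_scores = []
    for xi in xis:
        scores = []
        for i in range(k):
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            mean, var = fit_gaussian(non_fall[train_idx])
            scores.append(fold_score(mean, var, xi, non_fall[folds[i]], outliers))
        mean_scores.append(float(np.mean(scores)))
    return xis[int(np.argmax(mean_scores))], mean_scores
```

After `select_xi` returns, the final model would be re-fitted on all the non-fall data with the chosen $\xi$, as described above.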


5. Experimental Design 

5.1. Datasets 

The proposed fall detection approaches are evaluated on the following two human 
activity recognition datasets. 

1. German Aerospace Center (DLR) [26]: This dataset is collected using an Inertial Measurement Unit with an integrated accelerometer, gyroscope and 3D magnetometer, with a sampling frequency of 100 Hz. The dataset contains samples taken from 19 people under semi-natural conditions. The sensor was placed on the belt, either on the right/left side of the body, or in the right pocket, in different orientations. The dataset contains 7 activities: standing, sitting, lying, walking (up/downstairs, horizontal), running/jogging, jumping and falling. One subject did not perform the fall activity, and their data are omitted from the analysis.

2. MobiFall (MF) [27]: This dataset is collected using a Samsung Galaxy S3 device equipped with a 3D accelerometer and gyroscope. The mobile device was placed in a trouser pocket in random orientations. Mean sampling rates of 87 Hz for the accelerometer and 200 Hz for the gyroscope are reported. The dataset is collected from 11 subjects; eight normal activities are recorded: step-in car, step-out car, jogging, jumping, sitting, standing, stairs (up and down joined together) and walking. Four different types of falls are recorded: forward lying, front knees lying, sideward lying and back sitting chair. The different types of falls are joined together to make one separate class for falls. Two subjects performed only the fall activities, and their data are removed from the analysis.

The DLR dataset is collected in semi-naturalistic settings; therefore, the ratio of falls to normal activities is quite small, ≈ 0.0032 (26576 normal activity segments and 84 fall segments), whereas in the MF dataset this ratio is ≈ 0.0899 (5430 normal activity segments and 488 fall segments).

5.2. Data Pre-Processing 

For the MF dataset, the gyroscope has a different sampling frequency from the accelerometer and their time-stamps are not synchronized; therefore, the gyroscope readings are interpolated to synchronize them with the accelerometer readings. Although the calibration matrix for the DLR data is available to rotate the sensor readings to the world frame, we did not use it in our experiments because it did not improve the results. For the MF dataset, orientation information is present, but incorporating it led to a deterioration of the results. This observation is consistent with the work of de la Vega et al. [28], which suggests that activities can be detected without considering orientation. Winter [29] suggests that for the walking activity, 99.7% of the signal power was contained in the lower seven harmonics (below 6 Hz), with evidence of higher-frequency components extending up to the 20th harmonic. Beyond that frequency, the signal had the characteristics of ‘noise’, which can arise from different sources, such as electronic/sensor noise, the spatial precision of the digitization process, and human errors.
Therefore, for both datasets, sensor noise is removed using a 1st-order Butterworth low-pass filter with a cutoff frequency of 20 Hz. The signals are segmented into windows with 50% overlap, where the window size is 1.28 seconds for the DLR dataset and 3 seconds for the MF dataset, to simulate a real-time scenario with fast response. The DLR dataset does not use the same window size as the MF dataset because it contains short-duration fall events: when the window size is increased to 3 seconds, fall samples could not be extracted for many subjects, and cross-validation across different subjects (see Section 5.5) may not work as desired.
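The pre-processing steps above can be sketched with NumPy and SciPy as follows. Linear interpolation for aligning the gyroscope to the accelerometer time-stamps and causal filtering with `lfilter` (rather than zero-phase `filtfilt`) are assumptions, since the text does not specify either; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, lfilter


def align_to(t_target, t_source, values):
    """Linearly interpolate each column of `values` (sampled at t_source) onto t_target."""
    return np.column_stack(
        [np.interp(t_target, t_source, values[:, k]) for k in range(values.shape[1])]
    )


def lowpass(signal, fs, cutoff=20.0, order=1):
    """1st-order Butterworth low-pass filter applied causally along the time axis."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    return lfilter(b, a, signal, axis=0)


def sliding_windows(signal, fs, window_s, overlap=0.5):
    """Split a (T, C) signal into windows of window_s seconds with the given overlap."""
    win = int(round(window_s * fs))
    step = max(1, int(round(win * (1.0 - overlap))))
    n = (len(signal) - win) // step + 1
    if n <= 0:  # recording shorter than one window
        return np.empty((0, win) + signal.shape[1:])
    return np.stack([signal[i * step : i * step + win] for i in range(n)])


# Usage for DLR-like data: 100 Hz, 1.28 s windows (128 samples) with a 64-sample hop.
x = np.random.default_rng(0).normal(size=(1000, 3))
print(sliding_windows(lowpass(x, fs=100.0), fs=100.0, window_s=1.28).shape)  # (14, 128, 3)
```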

5.3. Feature Extraction 

The literature on feature extraction from motion sensors is very rich [30, 31, 32].

Most of the feature extraction techniques involve computing time domain, frequency 
domain, and statistical features from the sensor readings. We extract the following five 
signals from each of the datasets: 

1. Three acceleration readings $a_x, a_y, a_z$ along the x, y and z directions,

2. The norm of acceleration, $a_{norm} = \sqrt{a_x^2 + a_y^2 + a_z^2}$, and the norm of angular velocity (gyroscope), $\omega_{norm} = \sqrt{\omega_x^2 + \omega_y^2 + \omega_z^2}$, where $\omega_x$, $\omega_y$ and $\omega_z$ are the angular velocities in the x, y and z directions.

Considering three separate acceleration signals is useful for obtaining direction-specific information, whereas the norms of acceleration and angular velocity are useful for extracting orientation-invariant information. One objective of this study is to identify low-cost features that are highly discriminative in identifying various types of normal activities. Therefore, we extract 31 standard time- and frequency-domain features from these signals (shown in Table 1 along with their descriptions). Features are computed for each window for XHMM3. To extract temporal dynamics for HMM1, HMM2, XHMM1, XHMM2 and HMMNormOut, each window is sub-divided into 16ms frames and features are computed for each frame.
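A sketch of the five signals and a subset of the Table 1 features (f1-f22, f26, f27 and f29-f31) for one window or frame. The normalised features (f23, f24, f25, f28) are omitted because their normalisations are not specified here, and the energy is taken as the plain sum of squared full-FFT magnitudes; these choices, and the function names, are assumptions. Note that the correlations are undefined (NaN) if an axis is constant within the window.

```python
import numpy as np


def five_signals(acc, gyro):
    """Stack a_x, a_y, a_z, a_norm and w_norm as the columns of a (T, 5) array."""
    a_norm = np.linalg.norm(acc, axis=1)
    w_norm = np.linalg.norm(gyro, axis=1)
    return np.column_stack([acc, a_norm, w_norm])


def window_features(sig):
    """A subset of the Table 1 features for one window (or frame) of the five signals."""
    a_norm = sig[:, 3]
    q75, q25 = np.percentile(sig[:, 3:5], [75, 25], axis=0)
    spectrum = np.abs(np.fft.fft(a_norm))
    feats = [
        sig.mean(axis=0),                # f1-f5: means
        sig.max(axis=0),                 # f6-f10: maxima
        sig.min(axis=0),                 # f11-f15: minima
        sig.std(axis=0),                 # f16-f20: standard deviations
        q75 - q25,                       # f21-f22: IQR of a_norm and w_norm
        [spectrum[0]],                   # f26: DC component of the FFT of a_norm
        [np.sum(spectrum**2)],           # f27: energy, sum of squared FFT magnitudes
        np.corrcoef(sig[:, :3], rowvar=False)[np.triu_indices(3, k=1)],  # f29-f31: xy, xz, yz
    ]
    return np.concatenate([np.ravel(f) for f in feats])


# Usage: one 128-sample window gives 27 of the 31 features.
rng = np.random.default_rng(0)
acc, gyro = rng.normal(size=(128, 3)), rng.normal(size=(128, 3))
print(window_features(five_signals(acc, gyro)).shape)  # (27,)
```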

5.4. HMM Modelling 

For all the HMM-based fall detection methods discussed in the paper, the observation model uses a single Gaussian distribution with a diagonal covariance matrix, and the covariance values are constrained to lie between 0.01 and 100 during training. For optimizing the parameter $\xi$, a 3-fold internal cross-validation is used.

For all the HMM methods except XHMM3, the following procedure is adopted:
| Features | Type of feature | Reason to use |
|---|---|---|
| f1-f5 | Mean of $a_x, a_y, a_z, a_{norm}, \omega_{norm}$ [26] | Average features are used for the detection of body positions [?]. These features work well in identifying various ADL [26]. |
| f6-f10 | Maximum value of $a_x, a_y, a_z, a_{norm}, \omega_{norm}$ [26] | These features work well in identifying various ADL [26]. |
| f11-f15 | Minimum value of $a_x, a_y, a_z, a_{norm}, \omega_{norm}$ [26] | These features work well in identifying various ADL [26]. |
| f16-f20 | Standard deviation of $a_x, a_y, a_z, a_{norm}, \omega_{norm}$ [?] | The variance feature is used for estimating the intensity of an activity [?]. These features work well in identifying various ADL [26]. |
| f21-f22 | IQR of $a_{norm}, \omega_{norm}$ [26] | These features work well in identifying various ADL [26]. |
| f23 | Normalized Signal Magnitude Area [34] | This is useful to identify dynamic and static activities, e.g., running or walking versus lying or standing. |
| f24 | Normalized Average Power Spectral Density of $a_{norm}$ | This feature is useful for the detection of cyclic activities, e.g., walking, running, cycling [33]. |
| f25 | Spectral Entropy of $a_{norm}$ [33] | This is useful for differentiating between activities involving locomotion. |
| f26 | DC component after FFT of $a_{norm}$ [?] | This is shown to result in accurate recognition of certain postures and activities. |
| f27 | Energy, i.e., sum of the squared discrete FFT component magnitudes of $a_{norm}$ [33] | This is shown to result in accurate recognition of certain postures and activities. |
| f28 | Normalized Information Entropy of the discrete FFT component magnitudes of $a_{norm}$ [?] | This helps in discriminating activities with different energy values. |
| f29-f31 | Correlation between $a_x, a_y, a_z$ [26] | This helps in differentiating among activities that involve translation in one dimension, e.g., walking and jogging from taking the stairs up and down. |

Table 1: Extracted Features and their Description. 
• Each activity in the HMMs is modelled with 2, 4 or 8 states, where each individual state represents a functional phase of the gait cycle or a 'key pose' of the activity.

• Five representative sequences per activity are manually chosen to initialize the parameters. Initialization is done by segmenting a single sequence into equal parts (one per state), computing $\mu_j$ and $\Sigma_j$ for each part, and further smoothing the estimates with 3 iterations of BW.

• The transition matrix $A$ is ergodic (i.e., every state has transitions to every other state) and is initialized such that the transition probability from one state to another is 0.025, with the self-transition probabilities set accordingly [25]; the actual values are then learned by the BW algorithm.

• The prior probabilities of the states, $\pi$, are initialized to a uniform distribution (summing to 1 across all states) and are further learned during BW.

• The likelihood of a test sequence is computed using the forward algorithm, and classification decisions are taken based on it.
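The initialization and scoring steps listed above can be sketched as follows. This is an illustrative NumPy version, not the authors' code: it initializes from a single representative sequence (the BW smoothing and the use of five sequences are left out), interprets "set accordingly" as giving each self-transition the remaining probability 1 - 0.025(K - 1), and expects the observation log-densities `log_B` to be precomputed.

```python
import numpy as np


def _logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)


def init_hmm(seq: np.ndarray, n_states: int, off_diag: float = 0.025):
    """Initial (pi, A, means, variances) from one representative (T, d) sequence.

    The sequence is split into n_states equal parts; part j gives mu_j and the
    diagonal of Sigma_j. Self-transitions take the remaining probability mass.
    """
    parts = np.array_split(np.asarray(seq, dtype=float), n_states)
    means = np.array([p.mean(axis=0) for p in parts])
    variances = np.array([np.clip(p.var(axis=0), 0.01, 100.0) for p in parts])
    A = np.full((n_states, n_states), off_diag)
    np.fill_diagonal(A, 1.0 - off_diag * (n_states - 1))  # each row sums to 1
    pi = np.full(n_states, 1.0 / n_states)                 # uniform priors
    return pi, A, means, variances


def forward_loglik(log_pi: np.ndarray, log_A: np.ndarray, log_B: np.ndarray) -> float:
    """log p(O | model) by the forward algorithm in log space.

    log_B[t, j] is the log observation density of frame t under state j.
    """
    log_alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        # log_alpha_t(j) = log sum_i exp(log_alpha_{t-1}(i) + log A[i, j]) + log b_j(o_t)
        log_alpha = _logsumexp(log_alpha[:, None] + log_A, axis=0) + log_B[t]
    return float(_logsumexp(log_alpha, axis=0))
```

Working in log space avoids the underflow that plain products of many small densities would cause on long frame sequences.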

For XHMM3, the parameters $\mu_j$ and $\Sigma_j$ and the transition matrix are computed from the annotated data, and no additional BW step is used. When a novel state is added, its parameters are estimated by averaging the means and covariances of all the other states (with the covariance further inflated using the X-Factor), and the transition matrix is re-adjusted. The prior probabilities of the states are kept uniform. The decision to detect a fall is made using the Viterbi algorithm, which finds the most likely sequence of hidden states that produces the given observation sequence.
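A minimal sketch of the XHMM3 steps above, assuming diagonal covariances stored as variance vectors. The averaging and X-Factor inflation follow the text; the way the transition matrix is re-adjusted is not spelled out in this section, so the `p_new` scheme is an illustrative assumption, as is the decision rule `is_fall`, which flags a fall whenever the Viterbi path visits the added state.

```python
import numpy as np


def add_novel_state(means, variances, A, xi, p_new=0.01):
    """Append an unseen-fall state to a K-state HMM with diagonal Gaussian states.

    The new mean and variance are averages over the existing states, and the
    variance is inflated by the X-Factor xi. The transition re-adjustment below
    (p_new into the new state, uniform transitions out of it) is hypothetical.
    """
    K = A.shape[0]
    A_new = np.zeros((K + 1, K + 1))
    A_new[:K, :K] = A * (1.0 - p_new)   # rescale old rows so they still sum to 1
    A_new[:K, K] = p_new
    A_new[K, :] = 1.0 / (K + 1)
    means_new = np.vstack([means, means.mean(axis=0)])
    variances_new = np.vstack([variances, xi * variances.mean(axis=0)])
    return means_new, variances_new, A_new


def viterbi(log_pi, log_A, log_B):
    """Most likely state sequence for log_B[t, j] = log b_j(o_t)."""
    T, K = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A   # scores[i, j]: in state i at t-1, move to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):         # backtrack through the stored argmaxes
        path[t - 1] = back[t, path[t]]
    return path


def is_fall(path, novel_state):
    """Hypothetical decision rule: a fall if any frame is decoded as the novel state."""
    return bool(np.any(path == novel_state))
```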

5.5. Performance Evaluation and Metric 

To evaluate the performance of the proposed approaches for fall detection, we perform leave-one-subject-out cross-validation (LOOCV) [38], where only the normal activities of (N - 1) subjects are used for training, and the normal activities and falls of the N-th subject are used for testing. This process is repeated N times and the average performance metric is reported. This evaluation is person-independent and demonstrates the generalization capability of the classifiers, as the subject being tested is not included in training. The values of ξ used in the internal cross-validation for XHMM1, XHMM2 and XHMM3 are [1.5, 5, 10, 100]. The value of ω is set to 1.5 for obtaining outliers from the normal activities.
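The outlier rejection that yields the 'non-fall' data and the proxy falls can be illustrated with a Tukey-style IQR fence on per-sequence log-likelihoods. The exact rule is defined in Section 4 and is not reproduced here; this sketch assumes that only the lower fence Q1 - ω·IQR is used, with ω = 1.5 as stated above. The log-likelihood values are toy numbers.

```python
import numpy as np


def iqr_outlier_mask(loglik, omega=1.5):
    """Flag training sequences whose log-likelihood is unusually low.

    Assumed rule: outlier if loglik < Q1 - omega * IQR, where Q1 and the IQR
    are taken over the log-likelihoods of all training sequences.
    """
    loglik = np.asarray(loglik, dtype=float)
    q1, q3 = np.percentile(loglik, [25, 75])
    return loglik < q1 - omega * (q3 - q1)


loglik = np.array([-10.0, -11.0, -12.0, -10.5, -11.5, -50.0])
mask = iqr_outlier_mask(loglik)                   # only the last sequence is rejected
non_fall, proxies = loglik[~mask], loglik[mask]   # 'non-fall' data vs. proxy falls
```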

Conventional performance metrics such as accuracy, precision and recall may not be very useful when classifiers are expected to observe a skewed distribution of fall events with respect to normal activities. We use the geometric mean (gmean) [39] as the performance metric because it measures the accuracy separately on each class, i.e., it combines the True Positive Rate (TPR) and the True Negative Rate (TNR), and is given by $gmean = \sqrt{TPR \times TNR}$. An important property of gmean is that it is independent of the distribution of positive and negative samples in the test data. We also use two other performance metrics, the fall detection rate (FDR, i.e., the true positive rate) and the false alarm rate (FAR, i.e., the false positive rate), to better understand the performance of the proposed fall detection classifiers. A fall detection method that gives a high gmean, a high FDR and a low FAR is considered better than the others.
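A short sketch of the three metrics, with falls as the positive class; the function name is hypothetical. The usage lines, with made-up labels, illustrate the property noted above: multiplying the number of normal samples while keeping the per-class error rates fixed leaves gmean unchanged. Averaging over LOOCV folds is left out.

```python
import numpy as np


def fall_metrics(y_true, y_pred):
    """gmean, FDR and FAR with falls as the positive class (True / 1)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tpr = np.sum(y_pred & y_true) / np.sum(y_true)     # FDR: detected falls / all falls
    fpr = np.sum(y_pred & ~y_true) / np.sum(~y_true)   # FAR: false alarms / all normal samples
    return {"gmean": float(np.sqrt(tpr * (1.0 - fpr))), "FDR": float(tpr), "FAR": float(fpr)}


small = fall_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
# Ten times as many normal samples, with the same per-class error rates:
large = fall_metrics([1, 1] + [0] * 40, [1, 0] + [0] * 30 + [1] * 10)
```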

6. Results 

In this section, we present the fall detection results on the DLR and MF datasets. In the first experiment, the models are learned using only the normal activities, and falls are presented only during the testing phase. In the second experiment, we assume the presence of a few falls in the training set, build supervised models on both falls and normal activities, and test their performance. In the third experiment, we test our hypothesis that outliers from normal activities are similar to falls.

6.1. Training without fall data 

In this experiment, we compare the performance of the fall detection methods discussed in Section 3. HMM1 and HMM2 are trained on the full 'normal' data, while the three proposed XHMMs are trained on 'non-fall' data but use the full 'normal' data to optimize their respective parameters. We also compare the results with One-Class SVM (OSVM) [12] and One-Class Nearest Neighbour (OCNN) [40], which are trained only on the full 'normal' data. OSVM has a built-in mechanism to reject a fraction ν of the positive samples, which helps it decide the class boundary in the absence of data from the negative class. We set this parameter to the default value ν = 0.5 and implemented OSVM using MATLAB [41]. OSVM uses a Gaussian kernel by default for one-class learning. For OCNN, we set the number of nearest neighbours k to 1. For the HMM-based methods other than XHMM3 (where the number of states equals the number of labelled normal activities plus an additional state for modelling falls), the number of states is varied to study how performance changes as model complexity increases. The numbers of states tested are 2, 4 and 8 for both datasets. We observe that increasing the number of states does not significantly improve the performance of any method, although a larger number of states increases the training time significantly. For a given fixed-length sequence (for both the DLR and MF datasets), training an 8-state HMM takes almost twice as long as training a 4-state HMM, which in turn takes almost twice as long as training a 2-state HMM. We choose 4-state HMMs for this and subsequent experiments because they provide a good trade-off between accuracy and running time.
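For reference, a common formulation of the one-class nearest-neighbour rule with k = 1 compares a test point's distance to its nearest training point with that training point's own nearest-neighbour distance. The acceptance threshold `theta = 1.0` is an assumption, since the threshold used here is not stated; the MATLAB-based OSVM baseline is not reproduced. Distances are computed by brute force, which suits only small datasets.

```python
import numpy as np


def ocnn_is_novel(train: np.ndarray, test: np.ndarray, theta: float = 1.0) -> np.ndarray:
    """One-class 1-NN: True where a test point is rejected as novel (e.g. a fall).

    Rule: with x = NN(z) in the training set, reject z if
    ||z - x|| / ||x - NN(x)|| > theta. Assumes no duplicate training points.
    """
    d_train = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=2)
    np.fill_diagonal(d_train, np.inf)       # a point is not its own neighbour
    nn_dist_train = d_train.min(axis=1)     # ||x - NN(x)|| for every training point
    d_test = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    nn_idx = d_test.argmin(axis=1)          # index of NN(z) for every test point
    ratio = d_test[np.arange(len(test)), nn_idx] / nn_dist_train[nn_idx]
    return ratio > theta


train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
flags = ocnn_is_novel(train, np.array([[0.5, 0.1], [5.0, 5.0]]))
```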

Table 2 shows the performance of the different fall detection methods in the absence of training data for falls on both datasets. We observe that for both the DLR and MF datasets, HMM1 and HMM2 fail to detect any (or most) of the falls. For the DLR dataset, XHMM3 and XHMM1 show the highest gmean in comparison to the other methods. HMM_NormOut performs worse than the three XHMMs but better than HMM1 and HMM2. XHMM2 has the highest FDR, but at the cost of a high FAR. Both OCNN and OSVM perform worse than the proposed XHMM methods: OCNN identifies most of the falls at the cost of a large number of false alarms, whereas OSVM fails to detect most of the falls. For the MF dataset, XHMM2 performs the best, while XHMM1 and XHMM3 do not perform well because they classify most falls as step-in car and sitting. The reason for their poor performance is that the fall signals in this dataset contain sensor readings recorded after the subject has hit the ground. Therefore, the fall data contain some stationary values after the falling action has occurred. After creating overlapping windows, some of these windows may contain only stationary values and are likely to be classified as one of the static activities. OCNN and OSVM also perform worse, with high fall detection rates but large false alarm rates.


| Method | DLR gmean | DLR FDR | DLR FAR | MF gmean | MF FDR | MF FAR |
|---|---|---|---|---|---|---|
| HMM1 | 0 | 0 | 0.001 | 0.092 | 0.016 | 0.005 |
| HMM2 | 0 | 0 | 0.0001 | 0 | 0 | 0.002 |
| XHMM1 | 0.854 | 0.822 | 0.096 | 0.290 | 0.094 | 0.024 |
| XHMM2 | 0.784 | 0.965 | 0.360 | 0.810 | 0.978 | 0.298 |
| XHMM3 | 0.925 | 0.893 | 0.030 | 0.516 | 0.285 | 0.059 |
| HMM_NormOut | 0.326 | 0.500 | 0.731 | 0.515 | 0.399 | 0.244 |
| OCNN | 0.380 | 0.959 | 0.846 | 0.308 | 0.736 | 0.867 |
| OSVM | 0.163 | 0.117 | 0.394 | 0.652 | 0.879 | 0.508 |

Table 2: Performance of fall detection methods (4 states). For XHMM3, the number of states equals the number of labelled activities plus 1 state for the unseen fall.
Figure 4: gmean with error bars across all subjects: (a) DLR dataset, (b) MF dataset.


To understand the statistical stability of the proposed methods, we plot the mean values of gmean along with error bars representing the standard deviation (see Figure 4). Figure 4 shows that for both the DLR and MF datasets, all the proposed XHMM methods outperform HMM1, HMM2 and HMM_NormOut. Due to the skewed distribution of falls in both datasets, the standard deviation of gmean can be high, because a small number of misclassifications can change gmean greatly. This experiment shows that training HMMs on the full 'normal' data for detecting unseen falls, and setting a threshold as the maximum of the negative log-likelihood on the training sequences, is not the right approach; better models can be built when outliers are removed from the 'normal' data and the covariances of the X-Factor based HMMs are optimized.
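The thresholding baseline criticized above can be sketched in a few lines. The toy numbers (not taken from the experiments) show one way the rule can fail: a single atypical training sequence with a very low log-likelihood raises the threshold so far that clearly abnormal test sequences are no longer flagged, which motivates removing outliers first.

```python
import numpy as np


def max_nll_threshold(train_loglik):
    """Threshold = maximum negative log-likelihood over the training sequences."""
    return float(np.max(-np.asarray(train_loglik, dtype=float)))


def flag_falls(test_loglik, threshold):
    """A test sequence is flagged as a fall if its negative log-likelihood exceeds the threshold."""
    return -np.asarray(test_loglik, dtype=float) > threshold


fall_loglik = [-80.0, -150.0]                                      # toy test sequences
with_outlier = max_nll_threshold([-10.0, -11.0, -12.0, -200.0])    # 200.0
without_outlier = max_nll_threshold([-10.0, -11.0, -12.0])         # 12.0
missed = flag_falls(fall_loglik, with_outlier)                     # [False, False]
caught = flag_falls(fall_loglik, without_outlier)                  # [True, True]
```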

6.2. Feature Selection 

Selecting relevant features from a large set of features extracted from wearable sensors has been shown to improve results for activity recognition [42]. A major challenge in performing feature selection for the proposed problem of fall detection is that fall data are not available at training time; therefore, relevant features have to be selected from the non-fall data. We used the RELIEF-F feature selection method [42] for our task. RELIEF-F computes a weight for each feature according to how well it distinguishes between nearby data points of the same and of different classes. This method provides a ranking of the features in order of their merit for classification. We choose the top 10 and top 20 features and train the fall detection models discussed earlier with these reduced feature sets to study their effect on identifying unseen falls. The top selected features are mostly the mean, maximum, minimum, standard deviation, correlation, percentile and Signal Magnitude Area (see Table 3). Tables 4 and 5 show that for both the DLR and MF datasets, reducing the number of features from 31 to 20 decreases the performance of XHMM1 and XHMM3 but increases the performance of XHMM2 and HMM_NormOut. When the number of features is reduced to the top 10, the performance of all the classifiers deteriorates for the DLR and MF datasets (except for XHMM3). OCNN and OSVM perform worse than the XHMM methods. The degradation in performance may arise because feature selection is based on the normal activities only, rather than on both falls and normal activities. This experiment shows that feature selection can improve the performance of

| Dataset | Rank 1-10 | Rank 11-20 |
|---|---|---|
| DLR | f3, f4, f23, f5, f19, f14, f9, f20, f10, f13 | f7, f8, f15, f22, f30, f18, f31, f6, f29, f11 |
| MF | f2, f29, f31, f30, f3, f11, f19, f13, f22, f7 | f9, f4, f8, f17, f20, f18, f5, f6, f23, f12 |

Table 3: Top 10 and top 20 ranked features (compare with Table 1).
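For readers unfamiliar with RELIEF-F, the weighting idea described in Section 6.2 can be sketched as follows. This is a simplified textbook variant (Manhattan distance on range-normalized features, k nearest hits and misses, misses weighted by class priors) and not necessarily the implementation used to produce Table 3. Because falls are unavailable during training, `y` would hold the normal-activity labels; the toy data below are made up.

```python
import numpy as np


def relieff_weights(X, y, k=5):
    """Simplified ReliefF weights; larger means more useful for separating classes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                       # constant features contribute nothing
    Xn = (X - X.min(axis=0)) / span             # diff(A, I1, I2) = |x1 - x2| / range(A)
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)   # Manhattan distance to every sample
        dist[i] = np.inf                        # exclude the sample itself
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[np.argsort(dist[idx])][:k]      # k nearest samples of class c
            idx = idx[np.isfinite(dist[idx])]
            if len(idx) == 0:
                continue
            diff = np.abs(Xn[idx] - Xn[i]).mean(axis=0)
            if c == y[i]:
                w -= diff / n                          # near hits: penalize differences
            else:
                w += prior[c] / (1.0 - prior[y[i]]) * diff / n  # near misses: reward
    return w


y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [0.0, 0.1, 0.2, 0.1, 1.0, 0.9, 1.1, 1.0],  # separates the two classes
    [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6],  # unrelated to the class
])
w = relieff_weights(X, y, k=2)
ranking = np.argsort(-w)  # feature indices, best first
```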


| Method | 20 features: gmean | 20 features: FDR | 20 features: FAR | 10 features: gmean | 10 features: FDR | 10 features: FAR |
|---|---|---|---|---|---|---|
| HMM1 | 0 | 0 | 0 | 0.080 | 0.045 | 0 |
| HMM2 | 0 | 0 | 0.0001 | 0 | 0 | 0 |
| XHMM1 | 0.415 | 0.271 | 0.018 | 0.192 | 0.107 | 0.042 |
| XHMM2 | 0.852 | 0.933 | 0.213 | 0.832 | 0.933 | 0.248 |
| XHMM3 | 0.425 | 0.288 | 0.063 | 0.333 | 0.209 | 0.079 |
| HMM_NormOut | 0.786 | 0.921 | 0.317 | 0.771 | 0.783 | 0.217 |
| OCNN | 0.368 | 0.926 | 0.851 | 0.420 | 0.879 | 0.783 |
| OSVM | 0.237 | 0.203 | 0.501 | 0.053 | 0.039 | 0.553 |

Table 4: Performance of fall detection methods on reduced feature sets for the DLR dataset (compare with Table 2).


| Method | 20 features: gmean | 20 features: FDR | 20 features: FAR | 10 features: gmean | 10 features: FDR | 10 features: FAR |
|---|---|---|---|---|---|---|
| HMM1 | 0.093 | 0.020 | 0.007 | 0 | 0 | 0.005 |
| HMM2 | 0.106 | 0.022 | 0.002 | 0 | 0 | 0.005 |
| XHMM1 | 0.051 | 0.008 | 0.004 | 0.046 | 0.006 | 0.005 |
| XHMM2 | 0.829 | 0.957 | 0.239 | 0.785 | 0.763 | 0.185 |
| XHMM3 | 0.531 | 0.333 | 0.110 | 0.685 | 0.542 | 0.109 |
| HMM_NormOut | 0.759 | 0.774 | 0.163 | 0.566 | 0.453 | 0.127 |
| OCNN | 0.303 | 0.686 | 0.861 | 0.324 | 0.695 | 0.842 |
| OSVM | 0.579 | 0.717 | 0.516 | 0.658 | 0.933 | 0.508 |

Table 5: Performance of fall detection methods on reduced feature sets for the MF dataset (compare with Table 2).

6.3. Training with fall data

In this experiment, we compare several supervised classification algorithms for fall detection under two scenarios: (a) when the full fall data is available, and (b) when only a small amount of fall data is available during training and is gradually increased. The latter experiment simulates a scenario in which only a few falls are available to begin with. We simulate this scenario by supplying a controlled amount of fall data during the training phase, training the supervised classifiers on 1, 2, 4, 6, 8, 10, 25 and 50 fall samples randomly chosen from the full fall data. To avoid bias due to the random choice of fall data, we run this experiment 10 times (per LOOCV fold) and report the average value of the performance metrics. We use supervised versions of the XHMMs presented earlier. HMM1_sup is similar to XHMM1: each normal activity is modelled by a separate HMM using the full 'normal' data for that activity; however, because fall data are present, a separate HMM is also trained for fall events. HMM2_sup is similar to XHMM2: the full 'normal' activities are modelled by a general HMM, and a separate HMM is trained to model falls. HMM3_sup is similar to XHMM3; however, in this case a state representing the 'actual' fall activity is added to the HMM and its parameters are computed from the labelled fall data. The other two supervised classifiers we use are Random Forest (RF) and Support Vector Machine (SVM). The ensemble size of the RF is set to 200, and each decision (or split) in each tree is based on a single, randomly selected feature. For the SVM classifier, a Gaussian kernel with a width of 10 is used.
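The controlled-fall-data protocol above can be outlined as below. The fall counts and the 10 repetitions come from the text; `evaluate` is a hypothetical callback standing in for training and testing a supervised classifier within one LOOCV fold, and capping a draw at the number of available falls is our assumption.

```python
import numpy as np

FALL_COUNTS = [1, 2, 4, 6, 8, 10, 25, 50]  # fall samples supplied to the supervised classifiers
N_REPEATS = 10                             # random draws per LOOCV fold


def subsample_falls(fall_idx, n, rng):
    """Draw n distinct fall samples (all of them if fewer than n are available)."""
    fall_idx = np.asarray(fall_idx)
    return rng.choice(fall_idx, size=min(n, len(fall_idx)), replace=False)


def average_over_draws(fall_idx, n, evaluate, seed=0):
    """Mean score of a hypothetical evaluate(chosen_falls) callback over N_REPEATS draws."""
    rng = np.random.default_rng(seed)
    return float(np.mean([evaluate(subsample_falls(fall_idx, n, rng)) for _ in range(N_REPEATS)]))


# Usage with a dummy callback that just reports how many falls it received:
curve = {n: average_over_draws(np.arange(30), n, evaluate=len) for n in FALL_COUNTS}
```

In a real run, `evaluate` would return gmean for one fold, and the resulting curve corresponds to one line in Figure 5.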

Table 6 shows the LOOCV results for both datasets when full training data are available for falls and all the normal activities. For the MF dataset, all the supervised XHMM-based classifiers improve on their counterparts trained in the absence of falls. For the DLR dataset, the performance of HMM1_sup and HMM2_sup is worse than that of their counterparts trained without fall data, whereas HMM3_sup shows an improvement, with performance equivalent to SVM. The RF classifier gives intermediate results. Figures 5a and 5b show the performance of the supervised classifiers when the amount of fall data is gradually increased during the training phase for the DLR and MF datasets, respectively. All the supervised classifiers perform worse when the training data for falls is very small. Figure 5a shows that as the number of fall samples in the training data increases, HMM3_sup and SVM start to perform better than the other classifiers, but only reach performance equivalent to XHMM3 (shown by • on the y-axis, representing no training data for falls). The performance of XHMM3, which requires no fall data for training, is much better than that of its supervised counterpart (HMM3_sup) when only a small number of training samples for falls is available. Figure 5b shows that for the MF dataset the performance of HMM2_sup starts to improve when some fall data are added to the training set, whereas the other classifiers perform worse with limited training samples for falls. XHMM2 and HMM2_sup with a small number of training samples for falls show comparable performance. As the number of fall samples in the training set increases, HMM3_sup and SVM outperform the other methods.

Figure 5: Effect of varying the amount of fall data in supervised learning: (a) DLR dataset, (b) MF dataset. The two best-performing X-Factor approaches are shown on the y-axis, corresponding to zero training data (compare with Table 2).

| Method | DLR gmean | DLR FDR | DLR FAR | MF gmean | MF FDR | MF FAR |
|---|---|---|---|---|---|---|
| HMM1_sup | 0.768 | 0.719 | 0.054 | 0.489 | 0.259 | 0.038 |
| HMM2_sup | 0.601 | 0.533 | 0.087 | 0.925 | 0.939 | 0.084 |
| HMM3_sup | 0.938 | 0.908 | 0.021 | 0.969 | 0.988 | 0.045 |
| RF | 0.622 | 0.496 | 0.001 | 0.962 | 0.937 | 0.012 |
| SVM | 0.929 | 0.885 | 0.015 | 0.985 | 0.994 | 0.025 |

Table 6: Supervised fall detection with full training data for falls and all normal activities (compare with Table 2).
Both experiments, on the DLR and MF datasets, suggest that the performance of supervised classifiers improves as the number of fall samples in the training set increases. However, when they are trained on very limited fall data, their performance is worse than that of the proposed models, which did not observe falls before. The results from the study of Stone and Skubic [5] show that only 9 actual falls were obtained over a combined nine years of continuous activity data in a realistic setting, which highlights the rarity of falls and, consequently, the difficulty of training supervised classifiers on abundant fall data. Moreover, supervised methods cannot be trained in the absence of falls, whereas the proposed X-factor approaches can learn in the absence of training data for falls and identify them with high gmean and FDR.

6.4. Are outliers representative proxies for falls?

Section 4 assumes that the outlier sequences present in the normal activities can be used as a proxy for falls to estimate the parameters ξ. We conduct an experiment to evaluate the validity of this assumption. We use the supervised HMMs (HMM1_sup and HMM2_sup), with the only difference that they are trained on 'non-fall' activities (i.e., obtained after removing outliers from the normal data) and falls. During the testing phase, we present the 'outliers' to the classifier instead of normal and fall data. The idea is that some of the outliers rejected from the normal activities will be classified as falls, as they differ from the normal activities or the general non-fall concept due to inadvertent sensor artifacts.

HMM1_sup. When using HMM1_sup on the DLR dataset, the outliers of the normal activities 'Jumping' and 'Running' are most of the time classified as 'Falls', the outliers from 'Walking' and 'Lying' are sometimes classified as falls, whereas the outliers from 'Sitting' and 'Standing' are mostly classified as non-falls. This provides evidence that some of the short-term dynamic activities can have variations and may not be identified correctly as their respective classes. Similar experiments on the MF dataset show that only the outliers of the step-in car activity are classified as falls, and the outliers of the other 'non-fall' activities are classified as non-falls.
HMM2_sup. When using HMM2_sup, the outliers are mostly classified as falls for the MF dataset and as non-falls for the DLR dataset.

Based on the above experiments, we conclude that, in the absence of fall data during training, rejected outliers from the normal activities can be used as a proxy for falls, provided they are very different from the samples of normal activities or the general concept of normal activity. However, it should be noted that these rejected outliers are not actual falls, and only some of them are similar to falls.

7. Conclusions and Future Work 

The lack of sufficient data for falls can adversely affect the performance of supervised fall detection classifiers. Moreover, supervised classification methods cannot handle the realistic scenario in which no training data for falls is available. In this paper, we present three 'X-factor' HMM-based fall detection approaches that learn only from the normal activities captured by a body-worn sensor. To tackle the issue of having no training data for falls, we introduced a new cross-validation method, based on the inter-quartile range of log-likelihoods on the training data, that rejects spurious data from the normal activities, treats them as proxies for unseen falls and helps in optimizing the model parameters. The results showed that two of the XHMM methods achieve high detection rates for falls in a manner that is independent of the person and of the sensor placement. We showed that the traditional method of training an HMM on the full normal dataset and thresholding at the maximum of the negative log-likelihood to identify unseen falls is not the right approach for this problem. We also showed that supervised classifiers perform poorly with few training samples for falls, whereas the proposed methods show high performance in the absence of training data for falls. An important extension of the proposed techniques is the realization of an online fall detection system, which can begin with X-factor models as the initial representative models for unseen falls and incrementally adapt its parameters as it starts identifying some falls.